A Rational Indicator of Scientific Creativity 
A model is proposed for the creation and transmission of scientific knowledge, based on the network 
of citations among research articles. The model assigns to each article a nonnegative value 
for its creativity, i.e. its creation of new knowledge. If the entire publication network is truncated to 
the first neighbors of an article (the n references that it makes and the m citations that it receives), 
its creativity value becomes a simple function of n and m. After splitting the creativity of each 
article among its authors, the cumulative creativity of an author is then proposed as an indicator 
of her or his research merit. In contrast with other merit indicators, this creativity index yields 
similar values for the top scientists in two very different areas (life sciences and physics), thus offering 
good promise for interdisciplinary analyses. 
Introduction 



Evaluating the scientific merit and potential of tenure 
and professorship candidates is perhaps the most critical 
single activity in the academic profession. In countries 
and institutions with a long scientific tradition, selection 
committees are generally well trained and trusted to 
balance wisely the vast variety of factors that may influence 
the decision, so as to optimize the long-term scientific 
output. In less established environments, decisions are 
frequently perceived as arbitrary, and the use of objective 
indicators and procedures may be necessary to obtain a 
wide consensus.[1] 

The most traditional indicator of research output, the 
number of published papers, has been progressively 
replaced by the number of citations received by those 
papers, as this impact indicator has become widely 
available and easy to obtain.[2,3] Different combinations of 
both magnitudes have been proposed, like those in the 
SPIRES database. The field has been recently revitalized 
by Hirsch's proposal of yet another combination, the 
so-called h index, which has rapidly gained popularity, 
partly because the Thomson-ISI Web of Knowledge 
database [3] provides a handy tool to sort articles by 
their number of citations (while it offers no tools to 
obtain other indicators, like the total citation count). Apart 
from that comparative handiness, there is little objective 
evidence for the relative advantages of different indexes, 
which are generally motivated in terms of "impact" or 
"influence". However, it must not be forgotten that the 
task of a scientist is to create useful knowledge (in its 
broadest sense), not merely to produce an impact. It 
is therefore desirable to derive some rational measure of 
the magnitude and quality of research output, rooted in 
a plausible model of the creation and transmission of 
scientific knowledge. 



Creativity Model 

Basic scientific knowledge, as opposed to technological 
or industrial knowledge, is created by the minds of 
scientists and expressed almost exclusively as research articles. 
The knowledge is transmitted to other scientists, who 
read previous articles and acknowledge this transmission 
in the form of references (in what follows, I will call 
references of an article those made to previous papers, and 
citations those received from later papers). Thus, 
the output knowledge of an article comes partly from 
previous work, which is simply transmitted, and partly 
from the creation of new knowledge by the authors. 
However, there are many possible reasons why references are 
made. Furthermore, some of the references of an 
article may be more important than others. Thus, it 
is rather uncertain to what extent a given reference 
reflects the use of previous knowledge. Therefore, in the 
present model I will simply assume that each reference 
reflects the transmission of a different nonnegative value 
$x_{ij}$ of knowledge, with probability $P(x_{ij})$, from the cited 
article i to the citing article j. The maximum entropy 
principle dictates that, in the absence of any a priori 
information other than the average value $\langle x \rangle = 1/a$, the 
probability is given by $P(x) = a e^{-ax}$. 

Consider the network formed by all published papers 
connected by their citations. The growth, connectivity, 
and statistical properties of this and similar networks 
have been the subject of much recent work. To model 
the flow of knowledge on this supporting network, we 
may assign random flow numbers $x_{ij}$ to all citations, with 
probability $P(x_{ij})$. Flow conservation implies that the 
articles' knowledge-creation values $c_i$ (that I will simply 
call creativities) obey 

$$c_i = \sum_j x_{ij} - \sum_k x_{ki}, \tag{1}$$

where the first sum runs over the papers j that cite 
article i (its output) and the second over the papers k 
that article i cites (its input). 
I will discard negative knowledge as meaningless.[14] Thus, 
I will require that $c_i \ge 0 \;\; \forall i$, and reject the sets $\{x_{ij}\}$ 
that violate this condition. The final values $c_i$ will 
then be averages over all valid sets $\{x_{ij}\}$, with a relative 
weight $P(\{x_{ij}\}) \propto \exp(-a \sum_{ij} x_{ij})$. 

Some attention must be paid to the definition of knowl- 
edge that is being used. It might seem that all the knowl- 
edge created by an article must be present already when 
it is published. However, this would make it difficult 
to judge the relative importance of the knowledge cre- 
ated by different papers. Therefore, I rather consider 
the amount of "used knowledge" (and therefore useful). 
The situation is very similar in software development: 
the economic value of a computer library does not ma- 
terialize when it is written, but when licenses of it are 
sold, presumably to create new software (for free software 
we might replace licenses sold with copies downloaded). 
Similarly, I am counting every "copy" of the knowledge, 
used in every new paper that cites it (alternatively, one 
might consider the knowledge created by a paper as the 
sum of that added to all the brains that have read it). 

Some of the general qualitative features of the model, 
as an indicator of research merit, may be expected a 
priori: articles with fewer citations than references will have 
a positive but small creativity value; articles with a large 
output (highly cited) and a small input (not many 
references) will have the largest creativities; in contrast, the 
merit of review articles will be much more moderate than 
that shown by their raw impact factor (citation count); 
the differences between the creativities of authors in very 
large and active fields (with large publication and 
citation rates), and those in smaller and less active fields, will 
be largely attenuated, as compared to other merit 
indicators, since the basic measure is the difference between 
citations and references, which should be roughly zero on 
average in all fields; self-citations will be largely discounted, 
since they will count both as a negative contribution (to the 
citing paper) and a positive one (to the cited paper); 
citations received from a successful article (i.e. a highly cited 
one itself) will be more valuable than those made by a 
poorly cited one. In particular, citations by uncited 
papers will add no value at all, since no knowledge can 
flow through them; more generally, articles that generate 
a divergent citation tree (e.g. the DNA paper of Watson 
and Crick) will have a large creativity, while those 
leading ultimately to a dead end (e.g. the cold fusion paper 
of Fleischmann and Pons) will have a small one, even if 
they had the same number of direct citations. 



Simplified Model 

The quantitative analysis of the model presented above 
is an interesting challenge that will be addressed in the 
future. In this work, I am rather interested in simplifying 
the model to allow the easy generation of a practical 
indicator of research merit. The simplified model 
will keep many of the general features discussed above, 
though not all (in particular, it will lose the last two 
properties mentioned above). Thus, I propose to truncate 
the citation network beyond the first neighbors of 
any given paper, i.e. to consider only its n references 
and m citations, and to impose the conservation of flow, 
Eq. (1), only in the central node i. The average value $\langle x \rangle$ 
can be used as a convenient unit of knowledge, so that 
a = 1 and $P(x) = e^{-x}$. The probability that an article, 
with n references and m citations, has a creativity c is 
then, for n, m > 0: 



$$P(c|n,m) = N^{-1} \int_0^\infty dx_1 \ldots dx_n \, dy_1 \ldots dy_m \; \delta(c + x - y) \, e^{-x-y} \tag{2}$$



with $x = \sum_{i=1}^{n} x_i$ and $y = \sum_{j=1}^{m} y_j$, where $x_i$ are the 
input flows (references) and $y_j$ are the outputs (citations). 
$\delta(x)$ is Dirac's delta function, and N is a normalization 
factor given by 



$$N = \int_0^\infty dx_1 \ldots dx_n \, dy_1 \ldots dy_m \; \theta(y - x) \, e^{-x-y} \tag{3}$$



where $\theta(x)$ is the step function. Using a convenient 
change of variables, the integrals can be evaluated as 



$$N = \int_0^\infty \!\! \int_0^\infty dx \, dy \; \frac{x^{n-1} \, y^{m-1}}{(n-1)! \, (m-1)!} \; \theta(y - x) \, e^{-x-y}, \tag{4}$$

$$P(c|n,m) = N^{-1} \int_0^\infty \!\! \int_0^\infty dx \, dy \; \frac{x^{n-1} \, y^{m-1}}{(n-1)! \, (m-1)!} \; \delta(c + x - y) \, e^{-x-y}. \tag{5}$$

The result is 

$$P(c|n,m) = \frac{n \, e^{-c} \; {}_1F_1(1-m, \, 2-n-m; \, 2c)}{(n+m-1) \; {}_2F_1(1, \, 1-m; \, 1+n; \, -1)}, \tag{6}$$

where ${}_1F_1$ and ${}_2F_1$ are hypergeometric functions, which 
can be expanded as a finite series. Figure 1 shows some 
typical probability distributions. 
FIG. 1: Probability that an article that has made n = 30 
references and has received m citations has created a value c 
of scientific knowledge. It was obtained from Eq. (6). 
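
The truncated model can also be checked numerically. The following is a minimal Monte Carlo sketch, not the author's program: it draws the n input flows and the m output flows from P(x) = exp(-x), rejects the samples with negative creativity, and keeps the accepted values c = y - x. The function name `sample_creativity` and its parameters are illustrative assumptions.

```python
import random


def sample_creativity(n, m, n_samples=100_000, seed=0):
    """Sample c = y - x for an article with n references and m citations.

    x is the sum of n input flows and y the sum of m output flows, each
    flow drawn from P(x) = exp(-x). Samples with y < x (negative
    creativity) are rejected, as required by the step function in Eq. (3).
    Returns the accepted values of c and the acceptance rate, which is a
    Monte Carlo estimate of the normalization factor N.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        x = sum(rng.expovariate(1.0) for _ in range(n))  # total input
        y = sum(rng.expovariate(1.0) for _ in range(m))  # total output
        if y >= x:
            accepted.append(y - x)
    return accepted, len(accepted) / n_samples


samples, acceptance = sample_creativity(n=2, m=2)
mean_c = sum(samples) / len(samples)
print(f"acceptance rate (estimate of N): {acceptance:.3f}")
print(f"mean creativity: {mean_c:.3f}")
```

For n = m the acceptance rate should be close to 1/2 by symmetry, which gives a quick sanity check of the sampler. A histogram of `samples` approximates the distribution P(c|n, m).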



The average value of c, 

$$c(n,m) = \int_0^\infty dc \; c \, P(c|n,m), \tag{7}$$

is, for n, m > 0: 

$$c(n,m) = \frac{1}{(n-1)! \, (n+m-1)!} \; \frac{\sum_{k=0}^{m-1} \frac{(n+m-k-2)!}{(m-k-1)!} \, (k+1) \, 2^k}{\sum_{k=0}^{m-1} \frac{1}{(n+k)! \, (m-1-k)!}}. \tag{8}$$

FIG. 2: Circles: mean creation of knowledge (creativity) 
of an article with n references and m citations, calculated 
from Eq. (8) (in units of the mean transmission of knowledge 
reflected by one reference). Solid lines: fits given by Eq. (9). 
Dashed lines: m - n. 

This average is represented in Figure 2 for some typical values of 
n and m. As expected, c(n, m) increases with m and 
decreases with n. It obeys c(0, m) = m, c(n, 0) = 0, 
c(n, 1) = 1, and $c(n,m) \ge \max(1, m-n) \;\; \forall m > 0$. 
For the present purposes, a reasonably accurate fit is, for 
m > 0: 



$$c(n,m) \simeq m - n + \frac{n}{A \, e^{az} + B \, e^{bz}}, \tag{9}$$

where z = (m - 1)/(n + 5), A = 0.986, B = 0.014, a = 
1.08, and b = 6.3. The accumulated creativity of an 
author with $N_p$ published papers is then defined as 



$$C_a = \sum_{i=1}^{N_p} \frac{c(n_i, m_i)}{a_i}, \tag{10}$$

where $a_i$ is the number of authors of paper i. Notice that, 
being positive and cumulative, $C_a$ can only increase with 
time and with the number of published papers. 
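
As an illustration of Eqs. (8) and (10), the sketch below evaluates the mean creativity exactly with rational arithmetic and then accumulates it over an author's papers. It is not the author's merit program. It reads Eq. (8) as the prefactor 1/((n-1)! (n+m-1)!) times the sum over k of (n+m-k-2)! (k+1) 2^k / (m-k-1)!, divided by the sum over k of 1/((n+k)! (m-1-k)!), with k running from 0 to m-1. The function names and the sample paper list are hypothetical.

```python
from fractions import Fraction
from math import factorial


def creativity(n, m):
    """Mean creativity c(n, m) of an article with n references, m citations.

    Evaluates the finite sums of Eq. (8) exactly for n, m > 0 and uses the
    limiting values c(0, m) = m and c(n, 0) = 0 quoted in the text.
    """
    if m == 0:
        return Fraction(0)
    if n == 0:
        return Fraction(m)
    numerator = sum(
        Fraction(factorial(n + m - 2 - k), factorial(m - 1 - k)) * (k + 1) * 2**k
        for k in range(m)
    )
    denominator = sum(
        Fraction(1, factorial(n + k) * factorial(m - 1 - k)) for k in range(m)
    )
    return numerator / denominator / (factorial(n - 1) * factorial(n + m - 1))


def author_creativity(papers):
    """Eq. (10): sum of c(n_i, m_i) / a_i over (n_i, m_i, a_i) tuples."""
    return sum(creativity(n, m) / a for n, m, a in papers)


# Hypothetical author with three papers: (references, citations, authors).
papers = [(30, 100, 2), (25, 10, 1), (40, 0, 3)]
print(float(author_creativity(papers)))
```

Exact fractions avoid the overflow and rounding problems that the large factorials would cause in floating point. The special cases c(n, 1) = 1 and c(n, m) >= max(1, m - n) stated above can be used to check the implementation.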

In order to find in practice the creativity of an author 
(among many other merit indicators), one can follow 
these steps: 

1) Download the programs filter and merit from this author's web page, and compile them if necessary. 
2) Perform a "General search" in the Thomson ISI Web of Science database for the author's name, using the appropriate filters. 
3) Select the required records. Usually the easiest way is to check "Records from 1 to last_one" and click on "ADD TO MARKED LIST" (if you find too many articles, you may have to mark and save them by parts, say (1-500) -> file1, (501-last_one) -> file2). 
4) Click on "MARKED LIST". 
5) Check the boxes "Author(s)", "Title", "Source", "keywords", "addresses", "cited reference count", "times cited", "source abbrev.", "page count", and "subject category". Do not check "Abstract" nor "cited references", since this would considerably slow down the next step. 
6) Click on "SAVE TO FILE" and save the file on your computer. 
7) Click on "BACK", then on "DELETE THIS LIST" and "RETURN", and go to step 2 to make another search, if desired. 
8) If you suspect that there are two or more authors with the same name, use the filter program to help in selecting the papers of the desired author. 
9) Run the merit program to find the merit indicators. Watch for hidden file extensions, possibly added by your browser, when giving file names in this and the previous step. 



Results and Discussion 

Table I shows several indexes of merit of top 
scientists in life sciences and physics, taken from Hirsch's 
selection. It may be seen that the h index of all 
biologists is larger than that of all physicists, and their 
average number of publications and citations is 1.5-2.5 times 
larger. In contrast, the two creativity distributions are 
remarkably similar, with averages that differ by only ~15%, 
well below the standard deviation of both distributions. 
This offers the promise of direct interdisciplinary 
comparisons, without any field normalization, a highly desirable 
characteristic of any index of merit. 
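
The summary figures quoted above can be recomputed from the h and creativity columns of Table I. The short sketch below is only an arithmetic check on the transcribed values. The use of the population standard deviation is an assumption, chosen because it matches the tabulated values to the printed precision.

```python
from statistics import mean, pstdev

# h index and C_a (in units of 10^3) transcribed from Table I.
h_life = [154, 194, 145, 153, 154, 162, 130, 134, 138, 122]
h_phys = [96, 109, 111, 88, 94, 88, 92, 80, 88, 75]
ca_life = [32.0, 38.9, 27.8, 17.7, 13.8, 28.2, 18.3, 10.2, 19.2, 10.9]
ca_phys = [36.9, 10.3, 35.9, 29.3, 10.6, 7.8, 5.8, 23.9, 14.3, 9.9]

avg_life, avg_phys = mean(ca_life), mean(ca_phys)
# Relative difference between the two average creativities.
rel_diff = (avg_life - avg_phys) / avg_life

print(f"lowest h (life sciences): {min(h_life)}, highest h (physics): {max(h_phys)}")
print(f"average C_a: {avg_life:.1f} vs {avg_phys:.1f}")
print(f"standard dev.: {pstdev(ca_life):.1f} vs {pstdev(ca_phys):.1f}")
print(f"relative difference of averages: {rel_diff:.0%}")
```

The difference between the two averages is a small fraction of either standard deviation, which is the basis for calling the two distributions similar.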

Although it is a natural consequence of the idea of 
knowledge flow, the fact that the references of an article 
lower the merit assigned to it is admittedly striking. 
It is thus appropriate to recognize that 
this is partly due to a deliberate intent of measuring 
creativity rather than productivity (or, in economic terms, 
added value rather than sales). To illustrate the point, 
imagine that two scientists, Alice and Bob, independently 
address an important and difficult problem in their 
field. Bob takes an interdisciplinary approach and 
discovers that a method developed in a different field just 
fits their need. Simultaneously, Alice faces the problem 
directly and re-invents the same method by herself (thus 
making fewer references in her publication). All other 
factors being equal, both papers will receive roughly the 
same number of citations, since they transmit the same 
knowledge to their field. But it may be argued that 
Alice's work was more creative in some sense, and that her 
skills might possibly (but not necessarily) be more 
valuable in a given selection process. Eventually, the 
usefulness of different merit indicators will depend on how 
well they correlate with real human-made selections. 
Thus, Table I also shows a "productivity index" $P_a$ (not a 
probability), given by the author's share of the citations 
received by her/his papers. Notice that, in the model 
proposed, $N_c$ is the total output flow of knowledge from 
the author's papers, while $P_a$ is her/his share of it. It 
may be seen that $P_a$ also allows reliable interdisciplinary 
comparisons. It may be concluded that the main 
difference between the two communities is the larger average 
number of authors per article in the life sciences, which 
is taken into account in both $P_a$ and $C_a$, but not in the 
other indexes. 

Name              N_p    N_c (10^3)   h     P_a (10^3)   C_a (10^3)

B. Vogelstein     447    144.4        154   34.1         32.0
S. H. Snyder      1144   138.3        194   48.2         38.9
S. Moncada        693    106.2        145   32.5         27.8
P. Chambon        987    98.1         153   23.0         17.7
R. C. Gallo       1247   95.9         154   17.9         13.8
D. Baltimore      657    95.3         162   33.0         28.2
R. M. Evans       428    78.8         130   21.2         18.3
T. Kishimoto      1621   77.5         134   14.6         10.2
C. A. Dinarello   992    74.3         138   26.3         19.2
A. Ullrich        615    73.0         122   13.6         10.9
Average           883    98.2         149   26.4         21.7
Standard dev.     364    24.1         19    10.1         9.1

P. W. Anderson    342    56.7         96    39.1         36.9
A. J. Heeger      999    53.5         109   14.2         10.3
E. Witten         254    53.1         111   39.9         35.9
S. Weinberg       444    38.8         88    32.7         29.3
M. L. Cohen       625    37.4         94    14.3         10.6
M. Cardona        1096   37.0         88    12.8         7.8
A. C. Gossard     918    34.3         92    7.4          5.8
P. G. de Gennes   358    32.6         80    26.7         23.9
M. E. Fisher      446    29.8         88    19.0         14.3
G. Parisi         469    24.9         75    12.2         9.9
Average           595    39.8         92    21.8         18.5
Standard dev.     286    10.4         11    11.3         11.3

TABLE I: Several merit indicators of the ten most cited 
scientists in life sciences (upper group) and physics (lower 
group). $N_p$: number of papers published. $N_c$: number of 
citations received by those papers. h: number of papers with 
h or more citations (Hirsch index). $P_a$: author's 
knowledge-productivity index, $P_a = \sum_{i=1}^{N_p} m_i / a_i$, 
where $a_i$ and $m_i$ are the number of authors and of citations 
received by paper i. $C_a$: author's creativity index, Eq. (10). 
The data were obtained in April 2006. 

Knowledge-productivity and creativity indicators can 
be used also for groups, institutions, or journals. Thus, 
Table II shows them for some leading journals. As 
expected, most review journals have considerably smaller 
creativities than productivities (dramatically smaller in 
some cases). Still, Reviews of Modern Physics has the 
largest creativity index of all the journals studied, show- 
ing that collecting, processing, and presenting knowledge 
in a coherent way can by itself create much new useful 
knowledge. 

Journal             N_p    N_r/N_p   N_c/N_p   C/N_p   IF

Nature              3676   10        67        59      28.8
Science             2449   14        74        63      24.4

Rev. Mod. Phys.     20     284       327       160     13.4
Adv. Phys.          8      391       149       18      12.7
Surf. Sci. Rep.     5      159       61        3       10.3
Rep. Prog. Phys.    29     198       90        32      6.2
Phys. Rep.          81     166       90        22      5.6

Phys. Rev. Lett.    1904   18        59        44      6.0
Phys. Rev. D        1049   27        23        11      3.9
Nucl. Phys. B       620    37        42        24      3.3
Appl. Phys. Lett.   1819   13        34        26      3.3
J. Chem. Phys.      2040   37        37        16      3.1
Phys. Rev. B        3488   27        35        18      2.8

TABLE II: Several indicators of some of the main 
multidisciplinary, review, and non-review physics journals. $N_p$: number 
of "papers" (documents) published in year 1990, in all the 
sections included in the Science Citation Index database. $N_r$: 
number of references made by those papers. $N_c$: number of 
citations received by those papers until May 2006. C: sum of 
the creativities, Eq. (7), of those papers, $C = \sum_{i=1}^{N_p} c(n_i, m_i)$. 
IF: impact factor in 1998 (center of the period 1990-2006), as 
defined by the Journal Citation Reports. For the non-review 
physics journals (last group), the indicators (other 
than $N_p$ and IF) have been obtained from a random sample 
of their $N_p$ papers, rather than from the whole set. 

Finally, in a world of strong competition for positions 
and funds, a negative merit assignment to references 
might result in a tendency to reduce them below what 
would be scientifically desirable and professionally fair. 
A possible solution is to use a fixed value of n in c(n, m) 
(equal to the journal reference intensity, i.e. the 
average number of references per article in that journal), to 
calculate the creativities for competitive-evaluation 
purposes. This would spoil a few desirable properties of the 
model (like the discount of self-citations), but most of its 
effects would probably be rather mild, since the number 
of references per paper has a much smaller variance than 
the number of citations. Thus, the root mean squared 
difference between the creativities of Table II calculated 
using the average references of the journals, rather than 
the actual references of each article, is only ~4%. 
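
The fixed-n variant can be illustrated with a small sketch. It is not the calculation behind the ~4% figure, whose underlying data are not given here. It reads Eq. (8) as the prefactor 1/((n-1)! (n+m-1)!) times a ratio of two finite sums over k, and it rounds the journal's average number of references to an integer because that expression needs an integer n. The paper list, the rounding, and the function names are assumptions made for illustration only.

```python
from fractions import Fraction
from math import factorial


def creativity(n, m):
    """Exact mean creativity c(n, m), Eq. (8), with c(0, m) = m, c(n, 0) = 0."""
    if m == 0:
        return Fraction(0)
    if n == 0:
        return Fraction(m)
    numerator = sum(
        Fraction(factorial(n + m - 2 - k), factorial(m - 1 - k)) * (k + 1) * 2**k
        for k in range(m)
    )
    denominator = sum(
        Fraction(1, factorial(n + k) * factorial(m - 1 - k)) for k in range(m)
    )
    return numerator / denominator / (factorial(n - 1) * factorial(n + m - 1))


def journal_creativity(papers, fixed_n=None):
    """Sum of c(n_i, m_i) over (n_i, m_i) pairs; optionally force n_i = fixed_n."""
    return sum(creativity(n if fixed_n is None else fixed_n, m) for n, m in papers)


# Hypothetical sample of one journal: (references, citations) per paper.
papers = [(12, 40), (30, 5), (18, 22), (25, 0), (15, 90)]
mean_n = round(sum(n for n, _ in papers) / len(papers))  # integer n for Eq. (8)

actual = journal_creativity(papers)
fixed = journal_creativity(papers, fixed_n=mean_n)
print(f"actual references: {float(actual):.2f}")
print(f"fixed n = {mean_n}: {float(fixed):.2f}")
print(f"relative difference: {float(abs(fixed - actual) / actual):.1%}")
```

With a fixed n, the creativity of each paper depends only on its citation count, so an author can no longer raise it by trimming the reference list.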



Conclusion 

In conclusion, I have proposed an index of research 
merit based on creativity, defined as the creation of new 
scientific knowledge, in a plausible model of knowledge 
generation and transmission. It is calculated easily from 
the citations and references of the author's articles, and 
it is well suited for interdisciplinary comparisons. An 
advantage of such an index is that its meaning may be 
more easily perceived, by policy makers and the general 
public, as a measure of a scientist's social and economic 
service to the community. 